Analyzing Geographic Questions Using Embedding-based Topic Modeling

Abstract

Recently, open-domain question-answering systems have achieved tremendous progress because of developments in large language models (LLMs), and have been successfully applied to question-answering (QA) systems, or chatbots. However, there has been little progress in question answering in the geographic domain. Existing research in this domain relies heavily on rule-based semantic parsing approaches using little data. To develop intelligent GeoQA agents, it is crucial to build QA systems upon datasets that reflect real users' needs regarding the geographic domain. Existing studies have analyzed geographic questions from corpora such as Microsoft MAchine Reading Comprehension (MS MARCO), comprising real-world user queries from Bing, in terms of structural similarity, which does not discover users' interests. Therefore, we aimed to analyze location-related questions in MS MARCO based on semantic similarity, group similar questions into a cluster, and utilize the results to discover users' interests in the geographic domain. Using a sentence-embedding-based topic modeling approach to cluster semantically similar questions, we obtained topics that could gather semantically similar documents into a single cluster. Furthermore, we discovered latent topics within a collection of questions to guide practical GeoQA systems on relevant questions.
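The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the embedding model or the clustering algorithm, so everything below is an assumption made for illustration. In particular, `embed` is a hypothetical bag-of-words stand-in for a real sentence-embedding model (it captures word overlap, not true semantic similarity), and the greedy threshold clustering, the `threshold` value, and the sample questions are all invented here.

```python
import math
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"what", "is", "the", "of", "in"}

# Assumption: invented sample questions, not taken from MS MARCO.
SAMPLE_QUESTIONS = [
    "what is the population of tokyo",
    "population of paris france",
    "weather in tokyo in april",
    "what is the weather in paris",
    "population of london",
]

def embed(question):
    """Hypothetical stand-in for a sentence-embedding model: sparse word counts."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

def cluster(questions, threshold=0.3):
    """Greedy clustering: join the most similar cluster, or open a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for q in questions:
        vec = embed(q)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [q]})
        else:
            # The summed vector has the same direction as the mean vector,
            # so it serves as the centroid for cosine comparisons.
            best["centroid"].update(vec)
            best["members"].append(q)
    return clusters

def topic_words(c, k=3):
    """Label a cluster with its k most frequent words."""
    return [w for w, _ in c["centroid"].most_common(k)]

if __name__ == "__main__":
    for c in cluster(SAMPLE_QUESTIONS):
        print(topic_words(c, 1), c["members"])
```

In a real setting, `embed` would be replaced by a dense sentence-embedding model and `cosine` by the matching dense-vector similarity, so that questions with different wording but similar meaning land in the same cluster.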

Related resources

Topic Modeling and Classification of Cyberspace Papers Using Text Mining

The global cyberspace networks provide individuals with platforms where they can interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...

Language modeling using PLSA-based topic HMM

In this paper, we propose a PLSA-based language model for sports-related live speech. This model is implemented using a unigram rescaling technique that combines a topic model and an n-gram. In the conventional method, unigram rescaling is performed with a topic distribution estimated from a recognized transcription history. This method can improve the performance, but it cannot express topic t...

Latent Topic Embedding

Topic modeling and word embedding are two important techniques for deriving latent semantics from data. General-purpose topic models typically work in coarse granularity by capturing word co-occurrence at the document/sentence level. In contrast, word embedding models usually work in fine granularity by modeling word co-occurrence within small sliding windows. With the aim of deriving latent se...

Link Prediction using Network Embedding based on Global Similarity

Background: Link prediction is one of the most widely studied problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. Local link prediction approaches based on node structure are fast but not accurate enough. On the other hand, the global link predicti...

Geographic Topic Model: Appendix

Faceted topic models combine topical content with extraneous facets, such as ideology or dialect. In this model, the “pure” topics are corrupted by the facets, using a hierarchical generative model in which the pure topics act as priors on the faceted topics. This is most easily modeled using the logistic-normal distribution, which admits a normal prior on the mean. 1 Model We build on latent D...

Journal

Journal title: ISPRS International Journal of Geo-Information

Year: 2023

ISSN: 2220-9964

DOI: https://doi.org/10.3390/ijgi12020052